
    Incremental Algorithms for Effective and Efficient Query Recommendation

    Query recommender systems give users hints on possibly interesting queries related to their information needs. Most query recommenders are based on static knowledge models built from past user behavior recorded in query logs. These models must be periodically updated, or rebuilt from scratch, to keep up with shifts in user interests. We study query recommender algorithms that generate suggestions on the basis of models that are updated continuously, each time a new query is submitted. We extend two state-of-the-art query recommendation algorithms and evaluate the effects of continuous model updates on their effectiveness and efficiency. Tests conducted on an actual query log show that countering model aging by continuously updating the recommendation model is a viable and effective solution.
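
    The abstract does not spell out the update rule, so below is a minimal sketch of the general idea only: a recommendation model kept as a query co-occurrence graph whose counts are updated the moment each query arrives, so suggestions always reflect the current log. All names here are hypothetical, not the paper's.

        from collections import defaultdict

        class IncrementalQueryRecommender:
            """Sketch: an always-fresh model updated on every submitted query,
            instead of being periodically rebuilt from the whole log."""

            def __init__(self):
                # query -> {follow-up query -> count}, grown incrementally
                self.cooccur = defaultdict(lambda: defaultdict(int))
                self.last_query = {}  # session id -> previous query in session

            def observe(self, session_id, query):
                # Continuous update: link the new query to its predecessor.
                prev = self.last_query.get(session_id)
                if prev is not None and prev != query:
                    self.cooccur[prev][query] += 1
                self.last_query[session_id] = query

            def recommend(self, query, k=5):
                # Suggest the queries that most often followed `query` so far.
                followers = self.cooccur.get(query, {})
                return sorted(followers, key=followers.get, reverse=True)[:k]

        # usage: stream the log through the model, no batch rebuild needed
        model = IncrementalQueryRecommender()
        model.observe("s1", "jaguar")
        model.observe("s1", "jaguar speed")
        print(model.recommend("jaguar"))  # ['jaguar speed']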

    Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

    Author-supplied citations are only a fraction of the literature related to a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and offers no hints as to why those results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to navigate to PubMed updates more transparently, through search terms that can associate a paper with its citations. The algorithm that generates these search terms automatically extracts noun phrases from the paper with natural language processing tools and ranks them by their number of occurrences in the paper relative to their number of occurrences on the web. We define a search query as citation validated (CV) when at least one of the paper's author-supplied citations appears in its top 20 search results; when the overlapping citations were written by the same authors as the paper itself, we call the query CV-S, and when they were written by different authors, CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed over the 20 million papers on PubMed, would fall within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. The potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten per paper, and many more if the remaining searches that are not citation validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
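
    A sketch of the ranking step described above, assuming the per-occurrence noun-phrase list and a web occurrence-count lookup are supplied by external tools (both are hypothetical stand-ins here): phrases that are frequent in the paper but rare on the web score highest.

        from collections import Counter

        def rank_search_terms(noun_phrases, web_hits, top_k=20):
            """Rank candidate search terms by in-paper frequency relative to
            how common the phrase is on the web.

            noun_phrases: one entry per occurrence in the paper, e.g. from
                          an NLP noun-phrase chunker (assumed given).
            web_hits:     phrase -> estimated web occurrence count
                          (hypothetical lookup standing in for search counts).
            """
            paper_counts = Counter(noun_phrases)

            def score(phrase):
                return paper_counts[phrase] / max(web_hits.get(phrase, 1), 1)

            return sorted(paper_counts, key=score, reverse=True)[:top_k]

        phrases = ["topic model", "topic model", "gene expression"]
        hits = {"topic model": 1_000_000, "gene expression": 2_000_000}
        print(rank_search_terms(phrases, hits))  # ['topic model', 'gene expression']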

    Unity Is Strength: Coupling Media for Thematic Segmentation


    Improved Topic-Dependent Language Modeling Using Information Retrieval Techniques

    N-gram language models are frequently used by speech recognition systems to constrain and guide the search. An N-gram model uses only the last N-1 words to predict the next word, with typical values of N ranging from 2 to 4, so N-gram models lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique with experiments on the Wall Street Journal text corpus, a relatively difficult task for topic-dependent language modeling since the text is relatively homogeneous. The proposed method reduces the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model.
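
    The abstract does not give the combination formula; a common way to inject long-range topic context into an N-gram model is linear interpolation with a topic model built from retrieved documents, sketched below together with the perplexity measure used for evaluation. The function names and the weight lam are illustrative assumptions, not the paper's notation.

        import math

        def interpolated_prob(word, history, p_ngram, p_topic, lam=0.8):
            """P(w | h) = lam * P_ngram(w | h) + (1 - lam) * P_topic(w),
            where P_topic comes from documents retrieved with the current
            context (the IR step). The paper's exact scheme may differ."""
            return lam * p_ngram(word, history) + (1.0 - lam) * p_topic(word)

        def perplexity(tokens, prob_fn, order=3):
            """Perplexity of a token stream under a model giving P(w | h);
            lower perplexity means better predictive power."""
            log_sum = 0.0
            for i, w in enumerate(tokens):
                h = tuple(tokens[max(0, i - order + 1):i])
                log_sum += math.log(prob_fn(w, h))
            return math.exp(-log_sum / len(tokens))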

    Parents’ online school reviews reflect several racial and socioeconomic disparities in K–12 education

    Parents often select schools by relying on subjective assessments of quality made by other parents, which are increasingly available through written reviews on school ratings websites. To identify relationships between review content and school quality, we apply recent advances in natural language processing to nearly half a million parent reviews posted for more than 50,000 publicly funded U.S. K–12 schools on a popular ratings website. We find: (1) schools in urban areas and those serving affluent families are more likely to receive reviews; (2) review language correlates with standardized test scores, which generally track race and family income, but not with school effectiveness, measured by how much students improve their test scores over time; and (3) the language of reviews reveals several racial and income-based disparities in K–12 education. These findings suggest that parents who consult school reviews may be accessing, and making decisions based on, biased perspectives that reinforce achievement gaps.
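
    As a rough illustration of finding (2), one could pool each school's reviews, featurize the text, and compare how well the text predicts test-score levels versus score growth; the sketch below uses plain TF-IDF and ridge regression with scikit-learn, a deliberately simplified stand-in for the paper's NLP pipeline.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        def text_predictiveness(reviews_per_school, outcome):
            """Cross-validated R^2 of review text for a per-school outcome."""
            docs = [" ".join(reviews) for reviews in reviews_per_school]
            X = TfidfVectorizer(min_df=5, stop_words="english").fit_transform(docs)
            return cross_val_score(Ridge(), X, outcome, cv=5, scoring="r2").mean()

        # The reported pattern would show up as (hypothetical data):
        # text_predictiveness(reviews, score_levels)  -> clearly above zero
        # text_predictiveness(reviews, score_growth)  -> near zero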

    Optimal Mixture Models in IR

    We explore the use of Optimal Mixture Models to represent topics. We analyze two broad classes of mixture models: set-based and weighted. We provide an original proof that estimation of set-based models is NP-hard, and therefore not feasible in practice. We argue that weighted models are superior to set-based models, and that their solution can be estimated by a simple gradient descent technique. We demonstrate that Optimal Mixture Models can be successfully applied to the task of document retrieval. Our experiments show that weighted mixtures outperform a simple language modeling baseline. We also observe that weighted mixtures are more robust than other approaches to estimating topical models.
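
    A minimal sketch of the weighted case, assuming the component topic models are given: the mixture weights are fit by gradient ascent on document log-likelihood, with a softmax parameterization keeping them a distribution. This follows the general technique named in the abstract; the paper's exact estimator may differ.

        import numpy as np

        def fit_mixture_weights(doc_word_probs, steps=500, lr=0.1):
            """Estimate mixture weights maximizing the document likelihood.

            doc_word_probs: (n_words, n_components) array; entry [w, i] is
                            the probability of document word w under
                            component model i (assumed precomputed).
            """
            theta = np.zeros(doc_word_probs.shape[1])  # unconstrained params
            for _ in range(steps):
                lam = np.exp(theta) / np.exp(theta).sum()  # softmax weights
                mix = doc_word_probs @ lam                 # P(w) under mixture
                # d log L / d lam_i, averaged over words so the step size
                # does not depend on document length
                grad_lam = (doc_word_probs / mix[:, None]).mean(axis=0)
                # chain rule through the softmax parameterization
                theta += lr * lam * (grad_lam - grad_lam @ lam)
            return np.exp(theta) / np.exp(theta).sum()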

    A Statistical Model for Topic Segmentation and Clustering


    An Iterative Approach to Text Segmentation

    We present divSeg, a novel method for text segmentation that iteratively splits a portion of text at its weakest point in terms of the connectivity strength between the two adjacent parts. To search for the weakest point, we apply two different measures: one based on language modeling of text segmentation and the other on the interconnectivity between two segments. Our solution produces a deep, narrow binary tree, a dynamic object that describes the structure of a text and is fully adaptable to a user's segmentation needs. We treat it as a separate task to flatten the tree into a broad and shallow hierarchy, either through supervised learning on a document set or through explicit input of how a text should be segmented. The rich structure of the tree further allows us to segment documents at varying levels, such as topic, sub-topic, etc. We evaluated our new solution on a set of 265 articles from Discover magazine whose topic structures are unknown and need to be discovered. Our experimental results show that the iterative approach has the potential to generate better segmentation results than several leading baselines, and the separate flattening step allows us to adapt the results to different levels of detail and user preferences.
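
    A minimal sketch of the iterative splitting idea, using only the interconnectivity measure (lexical cosine between the two candidate halves); the paper combines this with a language-modeling measure and adds the separate flattening step.

        from collections import Counter
        import math

        def cosine(a, b):
            """Cosine similarity between two bag-of-words Counters."""
            dot = sum(a[w] * b[w] for w in set(a) & set(b))
            norm = math.sqrt(sum(v * v for v in a.values())) \
                 * math.sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        def split_tree(sentences, min_size=2):
            """Recursively split at the boundary where the two adjacent
            parts are least connected, yielding a deep, narrow binary tree."""
            if len(sentences) < 2 * min_size:
                return sentences  # leaf: too short to split further
            bags = [Counter(s.lower().split()) for s in sentences]

            def connectivity(cut):
                left = sum(bags[:cut], Counter())
                right = sum(bags[cut:], Counter())
                return cosine(left, right)

            # the weakest point is the cut with the lowest connectivity
            cut = min(range(min_size, len(sentences) - min_size + 1),
                      key=connectivity)
            return [split_tree(sentences[:cut], min_size),
                    split_tree(sentences[cut:], min_size)]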